Recombination dramatically speeds up evolution of finite populations 
We study the role of recombination, as practiced by genetically-competent bacteria, in speeding 
up Darwinian evolution. This is done by adding a new process to a previously-studied Markov 
model of evolution on a smooth fitness landscape; this new process allows alleles to be exchanged 
with those in the surrounding medium. Our results, both numerical and analytic, indicate that for a 
wide range of intermediate population sizes, recombination dramatically speeds up the evolutionary 
advance. 



Recombination of genetic information is a common evolutionary strategy, both in natural
systems [1] and in in-vitro molecular breeding [2]. This idea is also employed in genetic
programming [3], a branch of computer science which aims to evolve efficient algorithms.
Given all this, it is surprising that we still do not have a good understanding of the
conditions under which the benefits of recombination outweigh the inevitable costs.

Of course, there is a large literature on recombination, dating back to the ideas of Muller
and Kondrashov. One line of recent work focuses on two-locus genomes and considers whether
or not recombination would be favored; possible mechanisms include (weak) negative epistasis
(the fact that the reproduction rate is not just the sum of the individual rates), negative
linkage disequilibrium (the lack of independence of the allele distribution in a finite
population), or some combination thereof [6]. Others look at how the (static) genetic
background in which a mutation arises affects fixation probabilities ("clonal
interference"), comparing these with and without recombination [?]. In both of these
approaches, only one or two mutations at a time are "dynamic", a situation unlikely to hold
for rapidly evolving microorganisms. In contrast, our analysis considers a large number of
contributing loci.

In this paper, we study recombination in the context of a simple fitness landscape model [?]
which has proven useful in the analysis of laboratory-scale evolution of viruses and
bacteria [?]. The specific type of recombination we consider is based on the phenomenon of
bacterial competence [?]. Here, bacteria can import snippets of DNA from the surrounding
medium; presumably these are then homologously recombined so as to replace the corresponding
segment in the genome. This behavior is controlled by a cellular signaling system that
ensures that recombination only occurs under stress. The details of the DNA importation and
the aforementioned control have convinced most biologists [?] that competence is an
important survival strategy for many bacterial species.



Our model consists of a population of $N$ individuals, each of which has a genome of $L$
binary genes. An individual's fitness depends additively on the genome,
$x = \sum_{i=1}^{L} S_i$ with $S_i = 0, 1$. Evolution is implemented as a continuous-time
Markov process in which individuals give birth at rate $x$ and die at random so as to
maintain the fixed population size. Every birth allows the daughter individual to mutate
each of its alleles with probability $\mu_0$, giving an overall genomic mutation probability
of $\mu = \mu_0 L$.

The last part of our Markov process concerns the aforementioned recombination. At rate
$f_s L$, an individual has one of its genes deleted and substitutes in a new allele from the
surrounding medium; the probability of getting a specific $S$ is just its proportional
representation in the population. This mimics the competence mechanism as long as the
distribution of recently deceased (and lysed) cells is close to that of the current
population; this should be the case whenever the random killing due to a finite carrying
capacity is the most common reason for death. In Fig. 1a, we show simulation results for the
"velocity", i.e., the rate of fitness increase (at one representative point on the
landscape), for different $O(1)$ values of $f_s$ (the recombination probability per time per
site), as a function of $N$. At very small population sizes, recombination has little
effect, since there is no population diversity upon which to act. Each of the curves rises
sigmoidally to a saturation value at very large $N$ which is again roughly independent of
the recombination rate (see Fig. 1b and later). Because the population scale for this rise
is a strongly decreasing function of $f_s$, recombination at intermediate $N$ can give a
dramatic speedup of the evolution. It is worth mentioning that this basic result is
qualitatively consistent with recent experiments [?] in microorganism evolution which
demonstrate an increase in the efficacy of recombination as the population size is increased
(starting from small); we should note, however, that the details of recombination in the
experimental systems are different from those underlying bacterial competence.
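
The update rules of the model (birth at rate $x$, random death, per-allele mutation, and
import of an allele from the medium) can be sketched as a Gillespie-type simulation. The
code below is a minimal illustration under stated assumptions, not the program used to
produce Fig. 1: the function name `simulate`, the random initial genomes, and the small
default parameters are illustrative choices.

```python
import numpy as np

def simulate(N=50, L=40, mu=0.1, fs=1.0, t_max=2.0, seed=0):
    """Gillespie-style sketch of the birth/death/mutation/recombination process."""
    rng = np.random.default_rng(seed)
    # Assumption: start from random genomes, so the mean fitness is about L / 2.
    pop = rng.integers(0, 2, size=(N, L))
    t, times, means = 0.0, [0.0], [pop.sum(axis=1).mean()]
    while t < t_max:
        fitness = pop.sum(axis=1)        # x = sum_i S_i for each individual
        birth_rate = fitness.sum()       # each individual gives birth at rate x
        recomb_rate = N * fs * L         # each individual recombines at rate f_s * L
        total = birth_rate + recomb_rate
        t += rng.exponential(1.0 / total)
        if rng.random() < birth_rate / total:
            parent = rng.choice(N, p=fitness / birth_rate)
            child = pop[parent].copy()
            flips = rng.random(L) < mu / L   # per-allele probability mu_0 = mu / L
            child[flips] ^= 1
            pop[rng.integers(N)] = child     # a random individual dies; N stays fixed
        else:
            i, site = rng.integers(N), rng.integers(L)
            # Imported allele is 1 with the population frequency at that site.
            pop[i, site] = int(rng.random() < pop[:, site].mean())
        times.append(t)
        means.append(pop.sum(axis=1).mean())
    return np.array(times), np.array(means), pop
```

Averaging the slope of `means` against `times` over many seeds, within a fixed window of
mean fitness, would give a velocity estimate of the kind plotted in Fig. 1.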
FIG. 1: Velocity (averaged over 200 samples) measured between $\bar{x} = 95$ and
$\bar{x} = 105$, starting with an average fitness of 50. $L = 200$, $\mu = 0.1$. a) $v$
measured as a function of $N$, for various $f_s$. Error bars are shown for one value of
$f_s$, and are typical of all the data. b) $v$ measured as a function of $f_s$ for various
$N$. As $N$ increases, the velocity saturates at an $f_s$-independent value.



Can one understand these simulation results? At small $N$, we can appeal to previous results
for this model [?] that show that the population variance scales as $\mu N$. One would
therefore expect the small-$N$ breakpoint where the curves diverge to be roughly at
$N = 1/\mu$; this is consistent with the data in Fig. 1a, and we have checked this simple
scaling with mutation rate (data not shown). Another small-$N$ effect becomes evident if the
simulations are extended to much larger $f_s$ values, as shown in Fig. 1b. Now, the velocity
begins a slow decline at too large $f_s$, due to the recombination causing a loss of
diversity as various sites get locked into specific alleles.

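The behavior at larger $N$, past the inflection point of the velocity curves in Fig. 1a and
in the rising segments of the curves in Fig. 1b, is much less trivial. To make progress, we
start by assuming that the subpopulation at some particular fitness $x$ has equal
distributions at each site of the genome. This assumption means, of course, that selecting
at random an allele at any site gives a chance $x/L$ of getting $S = 1$ and $1 - x/L$ of
getting $S = 0$. Then, one can write down an equation for the infinite population size limit
which directly determines the fitness distribution function $P_x(t)$,

$$
\begin{aligned}
\frac{dP_x}{dt} = {} & (x - \bar{x})\,P_x
 + \mu\left[(x-1)\left(1 - \frac{x-1}{L}\right)P_{x-1}
 + (x+1)\,\frac{x+1}{L}\,P_{x+1} - x\,P_x\right] \\
 & + f_s L\left[\left(1 - \frac{x-1}{L}\right)\frac{\bar{x}}{L}\,P_{x-1}
 + \frac{x+1}{L}\left(1 - \frac{\bar{x}}{L}\right)P_{x+1}
 - \left(\left(1 - \frac{x}{L}\right)\frac{\bar{x}}{L}
 + \frac{x}{L}\left(1 - \frac{\bar{x}}{L}\right)\right)P_x\right]
\end{aligned}
\tag{1}
$$

where $\bar{x} = \sum_x x P_x$ is the mean fitness.
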
The first two terms are standard and reflect the birth-death process and the genomic
mutation, respectively; the explicit form of the mutation term arises from considering the
probability of an individual with fitness $x$ giving birth (rate $\sim x$), mutating
($\sim \mu$), and hence going either up ($\sim 1 - x/L$, the fraction of currently bad
alleles) or down ($\sim x/L$, the fraction of good alleles). The last term is new and
reflects the role of recombination. With the aforementioned assumption, the probability that
an individual of fitness $x$ will have its fitness altered is proportional to the
recombination rate $f_s$ times the probability of either: a) deleting a bad allele
($1 - x/L$) and picking up a good one ($\bar{x}/L$); or b) deleting a good allele ($x/L$)
and picking up a bad one ($1 - \bar{x}/L$).
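
The term-by-term description above can be turned into a small numerical routine. The
following is a minimal sketch, not the code used for the figures: the function names
`mfe_rhs` and `binomial`, the array layout (`P[x]` for `x = 0..L`), and the per-individual
recombination rate `fs * L` are illustrative assumptions chosen to match the
equi-distribution hypothesis.

```python
from math import comb

import numpy as np

def binomial(L, p):
    """Binomial distribution B(L, p) as an array indexed by x = 0..L."""
    return np.array([comb(L, k) * p**k * (1 - p) ** (L - k) for k in range(L + 1)])

def mfe_rhs(P, mu, fs):
    """Right-hand side dP/dt of the mean-field equation for P[x], x = 0..L."""
    L = len(P) - 1
    x = np.arange(L + 1)
    xbar = np.dot(x, P)
    # Per-individual rates of moving up or down by one unit of fitness:
    # mutation (birth rate x, probability mu) plus recombination (rate fs * L).
    up = mu * x * (1 - x / L) + fs * L * (1 - x / L) * (xbar / L)
    down = mu * x * (x / L) + fs * L * (x / L) * (1 - xbar / L)
    dP = (x - xbar) * P - (up + down) * P   # selection and departures from x
    dP[1:] += up[:-1] * P[:-1]              # arrivals from x - 1
    dP[:-1] += down[1:] * P[1:]             # arrivals from x + 1
    return dP
```

For any normalized `P`, the mean velocity `np.dot(x, mfe_rhs(P, mu, fs))` does not depend on
`fs`: the recombination part moves individuals up and down at rates whose difference,
$f_s(\bar{x} - x)$, averages to zero over the population.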

Before using this equation (and its modification for finite-$N$ effects; see below) to
analyze the numerical results, we need to test the underlying equi-distribution assumption.
To do this, we generated a population of $N = 1000$ at $f_s = 2$, and let it evolve until
reaching $\bar{x} = 75$, for $L = 100$. We then measured the respective probabilities for a
recombination event to increase or decrease the fitness, based on the fitness $x$ of the
chosen individual. As shown in Fig. 2, our theoretical expression has the correct functional
dependence, although it overestimates the actual probabilities by a roughly fixed amount.
This overestimate is due to the fact that individual sites have less diversity than is
predicted, a remnant of the aforementioned loss-of-diversity effect. Notwithstanding the
error (which we find decreases as $N$ increases), this comparison gives us confidence that
the above equation can account semi-quantitatively for the recombination process.
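
The comparison just described, measured versus "naive" move probabilities, can be written
down directly for any population stored as a 0/1 array. This is a sketch under assumptions:
the function name `recombination_move_probs` and the `(N, L)` array layout are illustrative,
and the measured probabilities are computed exactly from the site frequencies rather than
sampled from recombination events.

```python
import numpy as np

def recombination_move_probs(pop):
    """Probabilities that one recombination event raises or lowers each individual's fitness."""
    pop = np.asarray(pop, dtype=float)
    N, L = pop.shape
    freq = pop.mean(axis=0)                  # frequency of allele 1 at each site
    x = pop.sum(axis=1)                      # fitness of each individual
    xbar = x.mean()
    # Measured: average over the L equally likely sites of the chosen individual.
    up = ((1 - pop) * freq).mean(axis=1)     # site holds 0, imported allele is 1
    down = (pop * (1 - freq)).mean(axis=1)   # site holds 1, imported allele is 0
    # "Naive": every site is assumed to carry the same allele frequency xbar / L.
    up_naive = (1 - x / L) * (xbar / L)
    down_naive = (x / L) * (1 - xbar / L)
    return x, up, down, up_naive, down_naive
```

Averaged over the population, the measured up probability equals $\bar{f}(1 - \bar{f})$
minus the variance of the site frequencies (with $\bar{f} = \bar{x}/L$), so uneven sites can
only push it below the naive value, in line with the overestimate seen in Fig. 2.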

In Fig. 3a, we show the results of solving Eq. (1) numerically for a variety of $f_s$
values. At non-zero $f_s$, the fitness rapidly approaches a universal trajectory which is
$f_s$-independent; only the rate of approach varies. Hence, the amount of recombination is
of minor importance if $N$ is large enough for this mean-field theory to apply. We can
explain this by noting that the recombination term on its own tends to make the population
relax to a distribution in which the recombination flux up from $x$ balances the flux down
from $x + 1$, i.e., one that satisfies the equation

$$
\left(1 - \frac{x}{L}\right)\frac{\bar{x}}{L}\,P_x
 = \frac{x+1}{L}\left(1 - \frac{\bar{x}}{L}\right)P_{x+1}
\tag{2}
$$

FIG. 2: The probability of making an up and down move due to recombination as a function of
individual fitness, when the average fitness of the population was 75. For this run,
$N = 1000$, $L = 100$, $f_s = 1$, $\mu = 0.1$. The "naive" probabilities are those derived
assuming equal distribution of alleles at every locus.

FIG. 3: Velocity vs. average fitness for simulations of the non-cutoff MFE (Eq. (1)) for
various $f_s$. Parameters are $\mu = 0.1$, $L = 1000$, initial fitness $x_0 = 500$. The
curve for $f_s = \infty$ is taken from Eq. (3).

It is easy to verify that choosing $P$ to be binomial, $B(L, \bar{x}/L)$, satisfies this
requirement. The evolutionary dynamics can then be determined by multiplying both sides of
the MFE, Eq. (1), by $x$ and then summing over $x$, thereby computing the time derivative of
$\bar{x} = pL$. This yields

$$
\dot{p} = p(1 - p) + \mu\left(p - \frac{2}{L}\left(p(1 - p) + L p^2\right)\right)
\tag{3}
$$

Solving, we obtain for the case of initial $p(0) = 1/2$

$$
p(t) = \frac{1 + \mu(1 - 2/L)}
 {1 + 2\mu(1 - 1/L) + (1 - 2\mu/L)\,e^{-[1 + \mu(1 - 2/L)]\,t}}
$$

The final state is reached in an $O(1)$ time, and this is indeed quite rapid evolution; this
analytic curve is included in Fig. 3. Now, the fact that recombination attempts to enforce a
binomial distribution but otherwise does not directly change the rate of evolutionary
advance explains why it has little consequence in the $N \to \infty$ mean-field limit.
Essentially, the pure mutation-selection problem will, up to small corrections if $L$ is
large, also give rise to a binomial distribution, which therefore self-consistently solves
the entire equation. To see this, we replace the birth rate factors in the mutational part
of the MFE by the constant rate $\bar{x}$; this introduces an error of
$O((x - \bar{x})/\bar{x})$, which becomes $O(L^{-1/2})$ were we to have a binomial
distribution. Then, we can directly check that the same binomial ansatz solves the
$f_s = 0$ time-dependent MFE, giving rise to $\dot{p} = p(1 - p) + \mu p(1 - 2p)$; this
agrees with the above equation for large $L$. Hence, the only role for recombination is to
cause the system to dynamically select this particular solution of the mean-field theory;
the value of $f_s$ makes no difference, once we are past the transient period.
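
Because Eq. (3) is quadratic in $p$, it is a logistic equation $\dot{p} = a p - b p^2$ with
$a = 1 + \mu(1 - 2/L)$ and $b = 1 + 2\mu(1 - 1/L)$, so its solution can be checked
numerically. The sketch below compares a Runge-Kutta integration with the logistic closed
form for $p(0) = 1/2$; the function names and the step count are illustrative assumptions.

```python
import numpy as np

def p_dot(p, mu, L):
    """Eq. (3): rate of change of p = xbar / L for a binomial population."""
    return p * (1 - p) + mu * (p - (2.0 / L) * (p * (1 - p) + L * p * p))

def p_closed_form(t, mu, L):
    """Logistic solution of Eq. (3) with initial condition p(0) = 1/2."""
    a = 1 + mu * (1 - 2.0 / L)
    b = 1 + 2 * mu * (1 - 1.0 / L)
    return a / (b + (2 * a - b) * np.exp(-a * t))

def integrate_rk4(mu, L, t_end, steps=2000, p0=0.5):
    """Classical fourth-order Runge-Kutta integration of Eq. (3) up to t_end."""
    p, h = p0, t_end / steps
    for _ in range(steps):
        k1 = p_dot(p, mu, L)
        k2 = p_dot(p + 0.5 * h * k1, mu, L)
        k3 = p_dot(p + 0.5 * h * k2, mu, L)
        k4 = p_dot(p + h * k3, mu, L)
        p += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
    return p
```

The late-time value is $p = a/b$, slightly below 1 because mutation keeps knocking out good
alleles, and the relaxation rate is $a$, which is why the final state is reached in a time
of order one.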

We have now explained why the large-$N$ saturation value in Fig. 1a is roughly
$f_s$-independent. The remaining issue concerns the critical value $f_s^*(N)$ at which the
system reaches the plateau (see Fig. 1b); the previous argument suggests that as
$N \to \infty$, $f_s^* \to 0^+$. This value is of crucial importance, as it represents the
amount of recombination needed for a finite population to achieve the maximal rate of
evolution. Studying this requires inclusion of finite population effects in the evolution
equation, for which we employ a heuristic cutoff approach which has been shown to be
accurate in a variety of previous investigations [?]. In detail, we replace the first part
of the mean-field equation (MFE) with the alternate form



$$
\frac{dP_x(t)}{dt} = \left(x\,\theta(P_x - P_c) - \lambda\right)P_x(t)
\tag{4}
$$

where $\theta$ is the step function and $\lambda$ is chosen to satisfy population
conservation,

$$
\lambda = \int dx\, x\, P_x\, \theta(P_x - P_c)
$$

and $P_c$ is a cutoff of order $1/N$. Fig. 4a compares the time evolution of the stochastic
system with that predicted by the cutoff MFE, showing reasonable agreement. Finally, Fig. 4b
shows the desired effect, namely the fact that the transition point to rapid evolution is a
decreasing function of $\ln N$.
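
The cutoff replacement of the selection term is simple to state in code. The following is a
minimal sketch of Eq. (4) on a discrete fitness grid, assuming a normalized `P` and using
illustrative names (`cutoff_selection_rhs`, `binomial`); the mutation and recombination
terms are left out.

```python
from math import comb

import numpy as np

def binomial(L, p):
    """Binomial distribution B(L, p) as an array indexed by x = 0..L."""
    return np.array([comb(L, k) * p**k * (1 - p) ** (L - k) for k in range(L + 1)])

def cutoff_selection_rhs(P, P_c):
    """Selection term of Eq. (4): growth is switched off where P[x] <= P_c."""
    x = np.arange(len(P))
    alive = P > P_c                  # theta(P_x - P_c)
    lam = np.sum(x * P * alive)      # lambda, fixed by population conservation
    return (x * alive - lam) * P
```

With `P_c = 0` and every `P[x] > 0`, `lam` reduces to the mean fitness and the usual
$(x - \bar{x}) P_x$ term is recovered; bins below the cutoff can only decay, which
suppresses the thinly populated high-fitness tail.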

FIG. 4: a) Fitness as a function of time from an average over 50 runs, $N = 1000$, compared
to a numerical solution of the cutoff MFE with cutoff $P_c = 1/5000$. $\mu = 0.1$,
$f_s = 2$, $L = 50$, $x_0 = 25$. b) Cutoff for which $v$ is 90% of its maximal (cutoff = 0)
value as a function of $f_s$. $L = 200$, $\mu = 0.1$. Velocity measured between
$\bar{x} = 95$ and $\bar{x} = 105$, with initial $\bar{x} = 50$. Solid curve is the
theoretical prediction, Eq. (5).

Why does finite $N$ matter in this manner? It is easy to check that the cutoff term has no
consequential effect as long as the distribution remains binomial. The real breakdown in the
previous analysis occurs when $N$ becomes small enough that the variance (and hence the rate
of fitness advance) saturates at $\ln N$ instead of $L$. This transition means that the
mutation-selection balance is not consistent with the binomial. The simplest way to make an
estimate of the critical $N$ is to compare the calculated rate of mean fitness advance
$L\dot{p}$ based on the binomial distribution with that to be expected when finite-$N$
effects are dominant. To estimate the latter, we notice that the recombination term can be
thought of as containing both a drift piece and a diffusion piece,

$$
+\,[V P]' + [D P]''
$$

where $'$ refers to the finite-difference operator and $V_x = f_s (x - \bar{x})$,
$D_x = \frac{f_s}{2L}\left[x (L - \bar{x}) + \bar{x} (L - x)\right]$. The drift term is
small, because $x - \bar{x}$ is a power of $\ln N$, which is assumed much less than $L$;
hence, the most important effect is that of increased diffusion. This in fact appears to be
the secret behind the efficacy of recombination in this model, namely that it acts to
increase variation just like an increased mutation rate but without a mean drift term,
a.k.a. the "mutational load". The diffusion coefficient is finite as long as we are not near
$\bar{x} = L$. Assuming recombination dominates, we can use the results of previous analyses
of the mutation-selection problem with $f_s L$ substituted for the genomic mutation rate
$\mu x$. From ref. [8], the velocity under this assumption scales as

$$
v \sim (f_s L)^{2/3} \ln^{1/3} N
$$

Equating this to the previous velocity result, the critical value of $f_s$ at which the
system crosses over to rapid evolution is predicted to scale as

$$
f_s^* \sim \frac{L^{1/2}}{\ln^{1/2} N}
\tag{5}
$$

This is consistent with the data shown in Fig. 4b and indeed with the limited direct
simulation data in Fig. 1b.
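
The crossover estimate follows from a short balance of scales. As a sketch, take the
binomial-regime velocity to be of order $L$ (which is what $L\dot{p}$ gives when $p$ is not
close to 0 or 1), set it equal to the recombination-dominated velocity, and drop all
numerical prefactors:

```latex
\begin{align*}
(f_s L)^{2/3}\,\ln^{1/3} N &\sim L \\
(f_s L)^{2}\,\ln N &\sim L^{3} \\
f_s^{*} &\sim \frac{L^{1/2}}{\ln^{1/2} N}
\end{align*}
```

The required recombination rate therefore falls only logarithmically slowly with population
size, while growing as the square root of the genome length.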

At this stage of our understanding, it is impossible to make any quantitative contact with
experimental data. Nonetheless, conceptual insights that emerge from our study seem to offer
solutions for some of the mysteries underlying bacterial competence. Our results show that
in the population range of interest for many microorganism colonies, there is a huge
potential benefit to be gained from recombination; nevertheless, too much recombination can
hurt, as specific genes are too rapidly driven to the most common allele even if it is not
the beneficial one. This perhaps explains why recombination is so heavily regulated via
intercellular signaling. The mechanism behind this benefit seems to be the increased rate of
effective diffusion on the landscape, similar to what would happen with an increased
mutation rate except that there is no significant extra load. Finally, we have already
mentioned that our results are consistent with recent experiments; these could be extended
to check the basic prediction of our approach regarding the scaling of the needed rate
versus population size (Eq. (5)). Even more exciting would be the determination that the
signaling system is used for imposition of this result, measuring the effective population
by quorum sensing and feeding the information into the competence pathway.